ABSTRACT

Colors play a particularly important role in both designing 
and accessing Web pages. A well-designed color scheme improves
Web pages' visual aesthetics and facilitates user interactions.
As far as we know, existing color assessment studies
focus on images; studies on color assessment and editing for 
Web pages are rare. This paper investigates color assessment 
for Web pages based on existing online color theme-rating 
data sets and applies this assessment to Web color editing. This
study consists of three parts. First, we study the extraction 
of a Web page's color theme. Second, we construct color as- 
sessment models that score the color compatibility of a Web 
page by leveraging machine learning techniques. Third, we 
incorporate the learned color assessment model into a new 
application, namely, color transfer for Web pages. Our study 
combines techniques from computer graphics, Web mining, 
computer vision, and machine learning. Experimental re- 
sults suggest that our constructed color assessment models 
are effective, and useful in the color transfer for Web pages, 
which has received little attention in both Web mining and 
computer graphics communities. 

Categories and Subject Descriptors 

H.4.m [Information Systems]: Miscellaneous; H.2.8 [Database
Applications]: Data Mining

General Terms 

Algorithms 

Keywords 

Color assessment, Color transfer, Web mining, Transfer learning.

1. INTRODUCTION

Humans generally prefer certain colors over others, and 
this preference influences a wide range of human behaviors,
such as buying cars and choosing clothes [19]. Colors are
also crucial to the success of Web pages and directly affect 
the perception of their aesthetics and usability [16][24]. There
has been a large amount of work on Web page colors. Kondratova
and Goldfarb [14] conducted studies on color preferences
in Web design for a number of countries and identified
several country-specific color palettes. Thorlacius [26] inves- 
tigated the effects of visual factors, including colors, typog- 
raphy, and pictures in a Web page, on the users' interaction 
with that page. Coursaris et al. [6] studied the effects of
color temperature (cool or warm) on Web aesthetics. Their 
findings suggest that pages with a warm primary color (e.g., 
red) and a warm secondary color (e.g., orange) are the least 
aesthetically pleasing. These existing studies focus either on
exploring color design rules or on treating color as an important
factor in the evaluation of Web design. So far, little
work has been done on the direct evaluation of Web colors, 
despite the fact that an effective color assessment tool would 
be useful for many Web-related applications, especially Web 
appearance design. 

Most recently, O'Donovan et al. [18] initiated pilot work
on the construction of assessment models for color compati- 
bility using online data sets that consist of color themes and 
their associated compatibility ratings. These color themes 
were created by color experts and the compatibility ratings 
were the results of votes cast by viewers (users). The constructed
regression model proposed in [18] can score a color
theme, and their classification model predicts whether or 
not a color theme is compatible. They proposed several po- 
tential image editing applications, such as color theme op- 
timization and color suggestion. The experiments demon- 
strated the usefulness of their constructed color assessment 
models in image editing.

The above work motivated us to investigate the assessment
of Web pages in terms of color compatibility. Web page
designers usually use a small number of colors. Therefore,
an intuitive way to apply Web color assessment is to follow
the approach proposed by O'Donovan et al. [18], that is, to
obtain the color theme of a Web page, and then to assess its 
compatibility based on learned models. However, we found 
this approach inappropriate for direct use with Web pages, 
as there are obvious differences between a Web page and an 
image. A Web page, as shown in Fig. 1, has several areas,
denoted as the temporal part, that display visual content such
as images, Flash, and video. The colors found in the temporal
part change with the content. The areas outside the temporal
part are called fixed. As the colors in temporal parts
change from time to time and the colors in fixed parts are
the focus in Web design, this work only assesses the colors
of the fixed part. In contrast, studies on image colors do not 
require distinguishing fixed from temporal parts. Therefore, 
further study for Web color assessment is required. 

The primary goal of our study is to construct a color as- 
sessment model for Web pages. Thus, a new construction 
framework for a Web color assessment model is proposed. 
Several new algorithms are introduced to address the key 
steps in the construction framework. 

The current study applies the learned color assessment 
model in a new color editing application, namely, color trans- 
fer for Web pages. As in color transfer between images, given 
a source Web page and a reference Web page, a color trans- 
fer algorithm transforms the colors of the source Web page, 
such that the colors of the transferred source Web page become
similar to those of the reference page. An automatic
color transfer and assessment framework is presented in this 
study. This framework can help Web designers to choose 
Web colors. The new application may increase exchanges 
between the Web mining and computer graphics communities to
develop more techniques for computer aided Web design. 

2. RELATED WORK 
2.1 Color assessment 

Current color assessment focuses on assessing the compat- 
ibility of the combination of different colors. Color compat- 
ibility is related to human color preference, and high color 
compatibility indicates high color preference. There have 
been numerous studies on color compatibility assessment. 
Existing studies can be divided into three categories listed 
below. 

1. Single factor-based methods. These studies are based 
on color compatibility tools, such as color wheels. For 
example, Goethe [12] pointed out that contrasting colors,
i.e., those found on opposite sides of the color
wheel, are compatible. Some researchers have com- 
piled color templates based on the color wheel to help 
color designers. 

2. Multiple factor-based methods. These studies [15][23]
assess colors depending on hues and other factors, such 
as saturation, lightness, etc. Unlike the previous cate- 
gory of studies, this kind of work attempts to develop 
quantitative analysis based on controlled laboratory 
experiments. However, due to a lack of data, the re- 
sulting models sometimes contradict each other. 

3. Learning-based methods. Machine learning is a powerful
technique used to construct classification or scoring
models based on a large amount of training data. As
discussed earlier, O'Donovan et al. [18] conducted a
pilot study to learn a classification/scoring model that
quantitatively rates the compatibility of color themes
using online color theme-rating data.

Some other studies utilize the abovementioned color as- 
sessment theories to guide color selection and enhancement. 
For example, Cohen-Or et al. [5] proposed a color harmonization
method based on the color wheel. Lalonde et al. [15]
identified realistic images using color compatibility assess- 
ment results. There are online color assessment systems for 
Web pages such as [3]; however, these systems merely check 
local regions and focus on the color contrast. A system that 
assesses the overall colors does not currently exist. 

2.2 Color transfer 

Color transfer is an image editing technique that keeps 
the scene from a source image and applies the color style 
of a reference image [9]. Reinhard et al. [21] proposed the
first color transfer algorithm based on the mean and stan- 
dard deviation of the color values in the source and refer- 
ence images. The method is efficient but in most cases, 
artifacts are created when the source and reference images 
have different color distributions. In response, researchers 
have developed numerous solutions that transfer the colors 
of pixels locally [20] [25]. Color transfer has been success- 
fully applied in photo appearance enhancement and movie 
post-processing. Little headway has been made on color en- 
hancement for Web pages, despite the fact that color is also 
one of the main concerns of Web design. 
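
As a rough illustration of the statistics-matching idea behind [21], the sketch below shifts and scales one color channel of the source so that its mean and standard deviation match those of the reference. It is a minimal sketch under stated assumptions, not the implementation of [21]: the function name `match_channel_stats` is hypothetical, the channel values are assumed to be already expressed in a suitable (decorrelated) color space, and the handling of a flat source channel is our own choice.

```python
import statistics

def match_channel_stats(source, reference):
    """Shift and scale `source` so its mean/std match those of `reference`.

    Both arguments are flat lists holding the values of one color channel.
    """
    mu_s = statistics.fmean(source)
    mu_r = statistics.fmean(reference)
    sd_s = statistics.pstdev(source)
    sd_r = statistics.pstdev(reference)
    if sd_s == 0:
        # Assumption: a flat source channel has no contrast to rescale,
        # so every value is simply moved to the reference mean.
        return [mu_r for _ in source]
    gain = sd_r / sd_s
    return [(v - mu_s) * gain + mu_r for v in source]
```

Applying the function independently to each channel of the source gives a global transfer; because one gain and one offset are used for the whole channel, this global form cannot adapt to regions whose color distributions differ.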

2.3 Web appearance mining 

A large number of recent studies [2][27] have given attention
to the analysis of the visual appearance of Web pages. Cai
et al. [2] introduced a vision-based page segmentation algorithm
(VIPS) to extract the visual structure of a Web page. Wu
et al. [27] proposed an automatic approach for determining
whether or not a page is aesthetic. These results are summarized
into a new Web mining division, i.e., Web appearance
mining, as the (partial) analysis target is the appearance of
Web pages. Web appearance mining focuses on discovering
useful information (e.g., useful content blocks in [2]) based
on Web appearance. The current study also focuses on the 
appearance of a Web page. Therefore, part of the present 
work belongs to Web appearance mining. 

3. OVERVIEW OF THE FRAMEWORK 

Our proposed approach falls under a conventional ma- 
chine learning approach, involving extracting features and 
then learning the assessment model (a regression function). 




Figure 2: The steps for the construction of a color assessment model for Web pages. 



The proposed framework is summarized in Fig. 2. There 
are three main challenges facing the model construction, as
discussed below. 

• Fixed part location. Intuitively, the location of the fixed
part could be determined through source code
analysis. However, integrating the processing of source
code may be impractical for the assessment model because
the source code of many Web pages is irregular
and incomplete.

• Color theme extraction. Color theme extraction aims 
to obtain the major colors of a Web page; therefore, 
minor colors should be excluded. 

• Assessment model learning. The training data should 
consist of color themes of Web pages and their respec- 
tive ratings. Unfortunately, little such training data 
exists. The online color theme-rating data used by [18]
can be utilized as the training data. However, accord- 
ing to the machine learning theory (analyzed in detail 
in Section 6), the online data set is inappropriate for 
use because online color themes and Web color themes 
have different distributions. 

To address the above challenges, a series of new algo- 
rithms are proposed. Processing source codes to locate the 
fixed part of a page can be avoided using a computer vi- 
sion method. In color theme extraction, a new clustering 
algorithm is introduced to discover major colors as well as 
to exclude color outliers. Transfer learning is introduced 
into model learning to reweight the online training data and 
adapt the data distribution to that of Web color themes. 
Likewise, we adopt a machine learning strategy called en- 
semble learning to improve the generalization capability of 
our color assessment model. 

Our proposed framework differs from the approach used 
in [18] in two aspects: (1) in our framework, a Web page 
should be preprocessed to locate the fixed part, and (2) a 
transfer learning strategy is used to utilize the plentiful la- 
beled resources of color themes available online. Sections 
4, 5 and 6 explain the technical details of the three core 
steps, namely, fixed part location, color theme extraction 
and model learning, respectively. 

4. FIXED PART LOCATION 

Before a Web page is processed, it is transformed into an 
image, which is referred to as the Web-page image. When
there is no ambiguity, it is still called a Web page for the
sake of brevity. The process of locating the fixed parts is 
a Web structural mining problem. However, many existing 
Web structural mining algorithms rely heavily on the source 
codes, limiting their application, given that the source codes 
of many Web pages are non-standard and noisy. The vision-based
page segmentation algorithm (VIPS) [2], one of the best-known
Web page segmentation methods, is likewise difficult to use 
when the source codes of Web pages are too complex or 
non-standard. Hence, this study does not utilize these struc- 
tural analysis algorithms. Instead, we leverage computer vi- 
sion techniques to locate fixed parts, directly running on the 
transformed Web-page images. Three new methods are proposed
in this work, namely, block sampling-based, salience
map-based, and image synthesize-based methods. These
new methods are independent of source code and are, therefore,
more widely applicable in real applications. Experiments show
that the first method is the most effective and efficient, so
our final work uses only this method; the latter two
methods are described in the Appendix.

4.1 Block sampling-based method 

Our method generates a set of image blocks from a sequence
of temporal Web-page images of a given URL¹, as shown
in Fig. 3. There are three steps involved. (1) Each of the
temporal Web-page images is divided into $N_1 \times N_2$ blocks.
Assuming that there are $l$ temporal Web-page images,
for each block position we obtain $l$ image blocks after division.
(2) We calculate the similarity between a block of
the first Web-page image and the corresponding blocks of
the successive temporal Web-page images. As a result, $l-1$
similarities are obtained for each block. These similarities
are then averaged and normalized into [0, 1]. The similarity
between two image blocks is calculated based on the earth
mover's distance (EMD) [22] between their color histograms.
If the EMD is $d$, the similarity is defined
as $\exp(-d)$. (3) The blocks are sampled according to their
average similarities and then used to construct the set of
image blocks. The sampling strategy ensures that colors in
fixed parts are sampled with higher probabilities than those
in temporal parts.
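
The three steps above can be sketched as follows. This is a minimal sketch under several assumptions that the text does not fix: each block is summarized by a normalized one-dimensional histogram (so that the EMD reduces to a cumulative difference), the averaged similarities are min-max normalized into [0, 1], and blocks are drawn with replacement with probability proportional to the normalized similarity. The function names are hypothetical.

```python
import math
import random

def emd_1d(hist_a, hist_b):
    """EMD between two normalized 1-D histograms with unit bin spacing."""
    carried, total = 0.0, 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a - b          # mass that must move to the next bin
        total += abs(carried)
    return total

def block_similarities(snapshots):
    """snapshots[t][p] is the histogram of block p in the t-th page image.

    Returns one similarity per block position, normalized into [0, 1].
    """
    first, later = snapshots[0], snapshots[1:]
    sims = []
    for p, hist in enumerate(first):
        values = [math.exp(-emd_1d(hist, snap[p])) for snap in later]
        sims.append(sum(values) / len(values))
    lo, hi = min(sims), max(sims)
    if hi == lo:
        return [1.0 for _ in sims]
    return [(s - lo) / (hi - lo) for s in sims]

def sample_blocks(similarities, n_samples, seed=0):
    """Draw block indices with probability proportional to similarity."""
    rng = random.Random(seed)
    indices = range(len(similarities))
    return rng.choices(indices, weights=similarities, k=n_samples)
```

With this weighting, a block whose histogram never changes across snapshots is drawn often, while the block that changes most receives weight zero and is never drawn.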

The results of this method on an exemplar temporal Web-page
image are shown in Fig. 4. The left image in Fig. 4
shows the manually determined fixed part (not including the
black areas), and the right one shows the sampled blocks (not
including the black areas). Both $N_1$ and $N_2$ are set to 40.
Blocks in the fixed part are sampled more often than those in
the temporal parts. Although some blocks in the fixed part
are not sampled, the sampling does not change the major colors
of the fixed part. Most temporal parts are not
sampled (black areas).

¹Temporal Web pages of a given URL can be obtained automatically
from http://web.archive.org/.

Figure 3: The main steps of block sampling-based fixed part location.

Figure 4: A Web page's manually determined fixed part (left)
and the blocks of the Web page sampled by our method
(right). The black boxes are not included.

5. COLOR THEME EXTRACTION AND FEATURES

Our goal for Web page color theme extraction is somewhat 
different from that of color theme extraction in O'Donovan
et al. [18], whose goal is to extract a color theme that can 
represent the colors while also being highly rated. This 
means that the color theme extracted by [18] may not be 
the most representative color theme for an image. In con- 
trast, our goal is restricted to the extraction of a color theme 
that can represent the colors of a Web page as much as pos- 
sible, whether or not its ratings are high. As such, color 
theme extraction becomes a pure data clustering problem. 
A Web page usually contains a large number of different 
colors. Some colors (e.g., the orange-red colors in the page in
Fig. 4 (left)) occupy a very small proportion. These colors 
have almost no effect on viewers' visual perception of the 
colors of whole pages. Therefore, they should be considered 
as color outliers. Nevertheless, conventional clustering tech- 
niques such as K-means are sensitive to outliers. To deal 
with this problem, we introduce the outlier-aware clustering 
algorithm into this work. 
Figure 5: Results of the outlier-aware clustering algorithm
on a synthetic data set ([10]).



5.1 Outlier-aware clustering 

Data clustering was used in [18] to extract color themes for
images. Forero et al. [10] proposed the following clustering
model which explicitly accounts for outliers: 



$$\min \sum_{i=1}^{n} \sum_{k=1}^{K} u_{ik} \|x_i - m_k - o_i\|_2^2 + \lambda \sum_{i=1}^{n} \|o_i\|_2 \qquad (1)$$



where $x_i$ is a data point; $u_{ik} = 1$ if $x_i$ belongs to the $k$-th cluster
and $u_{ik} = 0$ otherwise; $m_k$ represents a centroid; $o_i$ is the
outlier vector of $x_i$, which is nonzero only when $x_i$ is an outlier;
$\delta(\cdot)$ denotes the indicator function; $\lambda > 0$ is an outlier-controlling
parameter, such that the higher the value of $\lambda$, the fewer
points are detected as outliers. When $\lambda \to \infty$,
all the data are deemed outlier-free and the outlier-aware
clustering reduces to K-means. Eq. (1) can be solved with
the Robust K-means algorithm
proposed in [10]. In this work, $K$ equals 5 and $\lambda$ is set to
70.
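
To make the role of the outlier-controlling parameter concrete, the following sketch alternates the three updates that minimize an objective of the form "squared residual plus a penalty on the norm of each outlier vector". It is a minimal illustration written for this text rather than the reference implementation of [10]; the function name `robust_kmeans`, the hard assignments, the optional `init` argument, and the fixed iteration count are assumptions.

```python
import math
import random

def robust_kmeans(points, k, lam, iters=50, init=None, seed=0):
    """Alternating minimization with hard assignments and outlier vectors."""
    dim = len(points[0])
    rng = random.Random(seed)
    centroids = [list(c) for c in (init or rng.sample(points, k))]
    outliers = [[0.0] * dim for _ in points]
    labels = [0] * len(points)

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    for _ in range(iters):
        # 1) Assign each outlier-compensated point to its nearest centroid.
        for i, x in enumerate(points):
            cleaned = [a - o for a, o in zip(x, outliers[i])]
            labels[i] = min(range(k), key=lambda c: sq_dist(cleaned, centroids[c]))
        # 2) Recompute centroids from the outlier-compensated points.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [
                    sum(points[i][d] - outliers[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
        # 3) Group soft-thresholding: a residual shorter than lam / 2
        #    yields a zero outlier vector (the point is an inlier).
        for i, x in enumerate(points):
            resid = [a - m for a, m in zip(x, centroids[labels[i]])]
            norm = math.sqrt(sum(r * r for r in resid))
            scale = max(0.0, 1.0 - lam / (2.0 * norm)) if norm > 0 else 0.0
            outliers[i] = [scale * r for r in resid]
    return centroids, labels, outliers
```

Points whose returned outlier vector is nonzero are the detected outliers. Increasing `lam` raises the threshold in step 3, so fewer points are flagged, and in the limit the procedure behaves like plain K-means.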

Figure 5 shows the results of the above outlier-aware clustering
algorithm on a synthetic data set (taken directly from
[10])². The two clusters are correctly obtained, and all
the outliers are correctly detected by the algorithm.

5.2 Color theme extraction with fixed part location



²In the supplementary material
(http://minus.com/lwD8XXA4mAsEY), we compare
the above algorithm with K-means in color theme extraction
for images. The results suggest that the above algorithm
can extract more representative colors from the original
images.



In this stage, five representative colors are obtained by applying
outlier-aware clustering to the colors of the fixed part
of a Web page obtained by the proposed block sampling-based
method. Because pixels in some blocks may be sampled more
than once, a weighted outlier-aware clustering algorithm is
required. Equation (1) is then transformed into:

$$\min \sum_{i=1}^{n} \sum_{k=1}^{5} w_i u_{ik} \|x_i - m_k - o_i\|_2^2 + \lambda \sum_{i=1}^{n} \|o_i\|_2 \qquad (2)$$

where $w_i$ is the number of times the $i$-th pixel is sampled. The solution
of Eq. (2) is similar to that of Eq. (1) with a trivial
modification.
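
One plausible reading of this modification, derived here from the form of Eq. (2) rather than quoted from [10], is that with the assignments fixed the centroid becomes a weighted mean and the soft-threshold level of each pixel is divided by its weight. The residual $r_i$ and the bracket notation are introduced only for this illustration.

```latex
% r_i = x_i - m_{k(i)} is the residual of pixel i to its assigned centroid.
% Setting the gradient of  w_i \|r_i - o_i\|_2^2 + \lambda \|o_i\|_2  to zero
% for o_i \neq 0 gives o_i as a shrunken copy of r_i.
m_k = \frac{\sum_{i=1}^{n} w_i u_{ik} (x_i - o_i)}{\sum_{i=1}^{n} w_i u_{ik}},
\qquad
o_i = r_i \left[ 1 - \frac{\lambda}{2 w_i \|r_i\|_2} \right]_{+},
\qquad
[t]_{+} = \max(t, 0).
```

With $w_i = 1$ for every pixel these updates reduce to the unweighted case, and a pixel that is sampled more often needs a smaller residual before it is treated as an outlier.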

The five representative colors obtained should be combined
from left to right to form a color theme. There are
5! = 120 possible left-to-right combinations of the five colors.
Given that the differences between two adjacent colors in a
color theme affect the rating of the color theme [18], it is
inappropriate to randomly set the relative spatial positions of
the five representative colors. A simple method is proposed
to capture partial spatial information of the five representative
colors. Once color clustering is completed, we calculate
the pair-wise distances between clusters. The pair-wise
distance between two clusters is the average of the pair-wise
distances of the positions of the pixels in the two clusters,
thereby indicating the spatial information of the five colors.
Finally, we select an optimal color combination from the
120 possible combinations, such that the relative positions
agree with the pair-wise distances as much as possible.
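
The selection among the 120 orderings can be sketched as an exhaustive search. The agreement criterion is not specified in the text, so the sketch assumes one possible choice: the normalized gap between two slots of the theme should match the normalized spatial distance between the clusters placed in them, measured by a sum of squared differences. The function name `order_theme` is hypothetical.

```python
from itertools import permutations

def order_theme(spatial_dist):
    """Return cluster indices from left to right.

    spatial_dist[a][b] is the average pixel-position distance between
    clusters a and b (a symmetric matrix with a zero diagonal).
    """
    n = len(spatial_dist)
    max_d = max(max(row) for row in spatial_dist) or 1.0

    def mismatch(order):
        # Assumed criterion: squared difference between the normalized
        # slot gap and the normalized spatial distance, over all pairs.
        cost = 0.0
        for p in range(n):
            for q in range(p + 1, n):
                slot_gap = (q - p) / (n - 1)
                cluster_gap = spatial_dist[order[p]][order[q]] / max_d
                cost += (slot_gap - cluster_gap) ** 2
        return cost

    return min(permutations(range(n)), key=mismatch)
```

Because only pair-wise distances are used, an ordering and its mirror image have the same cost; the search returns whichever of the two is enumerated first.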

Figure 6 shows three exemplar results of the color themes
extracted by different methods: K-means (in (a)),
the outlier-aware method with salience map-based location (in
(b)), the outlier-aware method with image synthesize-based
location³ (in (c)), and the outlier-aware method with block
sampling-based location (in (d)). The color themes from
outlier-aware clustering + block sampling-based location are less
sensitive to the colors of the temporal part than those from
the other three methods.
For instance, in Fig. 6(2), only the color theme in (d) does
not contain the purple color, which does not appear in the
fixed part.

5.3 Features 

The features proposed in [18] are leveraged in this work
with a slight modification. The mean values are weighted 
by the proportions of pixels in each of the five colors in the 
color theme obtained by the clustering algorithm. Finally, 
a feature vector with 334 dimensions is constructed to rep- 
resent the color theme of a Web page. This vector contains 
comprehensive factors related to color compatibility. 

6. ASSESSMENT MODEL LEARNING 
6.1 Problem description 

Let $P_w$ denote the distribution of color themes of Web
pages. Intuitively, $P_w$ mainly depends on the distribution
of Web pages. Let $P_o$ denote the distribution of online color
themes created by color experts. Intuitively, $P_o$ mainly depends
on the color experts who created them. Then, we have
$P_w \neq P_o$. From a machine learning perspective, if the
distribution of future (test) data, i.e., $P_w$, is different from
that of the training data, i.e., $P_o$, the learned model usually has poor
generalization capability.

³Please refer to the Appendix for more details.

Figure 6: Color theme extraction for Web pages: (a)
K-means, (b) Outlier-aware clustering + Salience
map-based fixed part location, (c) Outlier-aware
clustering + Image synthesize-based fixed part location,
and (d) Outlier-aware clustering + Block
sampling-based fixed part location.

The learning problem is then described as follows. Given
a set of features $X = \{x_1, \dots, x_N\}$ of color themes extracted
from a collection of Web pages, the underlying distribution of
$X$ is $P_w$. There exists a set of features $X' = \{x'_1, \dots, x'_{N'}\}$
of online color themes and the associated user ratings
$Y' = \{y'_1, \dots, y'_{N'}\}$. The underlying distribution of $X'$ is $P_o$,
and $P_w$ is different from $P_o$. The question is how to learn a
color assessment model for Web pages using $X$, $X'$, and $Y'$.

6.2 Learning with transfer learning 

If the online and Web color themes are denoted as the
source and target domains, respectively, the above problem
is to learn an assessment model in the target domain
with the help of the source domain, which has a large amount
of labeled training data. This is a standard transfer
learning problem. Huang et al. proposed an effective
transfer learning algorithm that reweights the data in the
source domain so that the distribution of the reweighted
data approaches the distribution of the target
domain. Let $\kappa$ be a radial basis kernel function between two
samples, and define

$$\Phi_{ij} = \kappa(x'_i, x'_j), \qquad \phi_i = \frac{N'}{N} \sum_{j=1}^{N} \kappa(x'_i, x_j) \qquad (x'_i, x'_j \in X',\ x_j \in X) \qquad (3)$$

$\Phi$ can be viewed as a measure of the similarities among
source data, and $\phi$ measures the similarities between source
and target data. Let $\beta = \{\beta_i\}$ ($i = 1, \dots, N'$) denote
the weights of the source samples. $\beta$ can be obtained by solving
the quadratic problem below.



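$$\min_{\beta} \ \frac{1}{2} \beta^{\top} \Phi \beta - \phi^{\top} \beta \quad \text{s.t.} \quad \beta_i \in [0, B] \ \text{ and } \ \Big| \sum_{i=1}^{N'} \beta_i - N' \Big| \le N' \epsilon \qquad (4)$$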



$B$ is set to 1000 and $\epsilon$ is set to 1 in this work. Equation (4)
shows that a larger similarity between a source data point
and all the target data points results in a larger weight for
that source data point. This is reasonable because domain
adaptation aims to reweight the source data so that its
distribution approaches that of the target data.
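
The reweighting can be illustrated with a small sketch. It is a simplified stand-in for a proper quadratic programming solver, under stated assumptions: the kernel width `gamma` is a free parameter, the box constraint is handled by cyclic coordinate descent (each weight is set to its exact one-dimensional minimizer and clipped), and the constraint on the sum of the weights is only checked afterwards rather than enforced. The function names are hypothetical.

```python
import math

def rbf(u, v, gamma=1.0):
    """Radial basis kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def kmm_weights(source, target, B=1000.0, eps=1.0, gamma=1.0, sweeps=2000):
    """Approximately minimize 0.5 * b'Phi b - phi'b with 0 <= b_i <= B."""
    n_src, n_tgt = len(source), len(target)
    Phi = [[rbf(s, t, gamma) for t in source] for s in source]
    phi = [n_src / n_tgt * sum(rbf(s, t, gamma) for t in target) for s in source]
    beta = [1.0] * n_src
    for _ in range(sweeps):
        for i in range(n_src):
            # Exact minimizer over beta_i with the other weights fixed,
            # then clipped to the box [0, B].
            rest = sum(Phi[i][j] * beta[j] for j in range(n_src) if j != i)
            beta[i] = min(B, max(0.0, (phi[i] - rest) / Phi[i][i]))
    # Assumption: the sum constraint is verified, not enforced.
    feasible = abs(sum(beta) - n_src) <= n_src * eps
    return beta, feasible
```

In a toy case with two source samples close to the target samples and one far away, the distant sample receives a weight near zero, which matches the qualitative behavior described above. For real data a dedicated QP solver that enforces both constraints would be the safer choice.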

After determining $\beta$, a color assessment function can then
be obtained using a regression algorithm. This study uses
the LASSO regression method because of its good performance
reported in [18]. Combined with the weights, LASSO
can be written as below.

$$\min_{a, b} \sum_{i=1}^{N'} \beta_i (a^{\top} x'_i + b - y'_i)^2 + \lambda \|a\|_1 \qquad (5)$$

where $a$ is the weight vector for features and $b$ is a constant.
The above equation can be solved by a convex optimization 
algorithm pT] . 

Algorithm 1 Ensemble-based transfer learning

Input: X, X', Y', λ, B, ε, L;
Output: a, b;

1: Calculate β based on Eq. (4).
2: for (int l = 1; l ≤ L; l++) do
3:   Generate a new empty training set T_l.
4:   for (int i = 1; i ≤ N'; i++) do
5:     Generate a random number d in [0, Max(β)].
6:     if d < β_i then
7:       Insert (x'_i, y'_i) into T_l.
8:     end if
9:   end for
10:   Generate a_l and b_l by optimizing Eq. (5) on T_l.
11: end for
12: Return a = sum(a_l)/L and b = sum(b_l)/L.



6.3 Improvement by ensemble learning 

Ensemble learning is another machine learning strategy 
that can improve the generalization capability of a model. 
Therefore, this section utilizes ensemble learning to further 
improve the generalization capability. Specifically, a new 
ensemble strategy is proposed by combining the weights 
obtained from transfer learning and the bagging strategy [8].
A new training set is produced in each run of bagging by 
sampling the data in the source domain according to their 
weights obtained from Eq. (4). The generated new training
set is then used to learn a regression function by LASSO,
as shown in Eq. (5). Finally, all the learned functions are
averaged to form the final scoring model for color assessment.
The steps of the ensemble-based transfer learning are shown
in Algorithm 1.

Figure 7: The main steps of automatic color transfer
for Web pages.
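
The ensemble procedure can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the LASSO solver is a plain coordinate descent written for this example, each bag is fitted with unit sample weights, empty bags are skipped, and the names `weighted_lasso` and `ensemble_transfer` are hypothetical.

```python
import random

def soft_threshold(value, level):
    if value > level:
        return value - level
    if value < -level:
        return value + level
    return 0.0

def weighted_lasso(X, y, w, lam, sweeps=200):
    """Coordinate descent for sum_i w_i (a.x_i + b - y_i)^2 + lam * ||a||_1."""
    n, dim = len(X), len(X[0])
    a, b = [0.0] * dim, 0.0
    for _ in range(sweeps):
        fit = [sum(aj * xj for aj, xj in zip(a, x)) for x in X]
        b = sum(wi * (yi - fi) for wi, yi, fi in zip(w, y, fit)) / sum(w)
        for j in range(dim):
            # Residual with feature j removed from the current fit.
            partial = [y[i] - b - sum(a[k] * X[i][k] for k in range(dim) if k != j)
                       for i in range(n)]
            rho = sum(w[i] * X[i][j] * partial[i] for i in range(n))
            z = sum(w[i] * X[i][j] ** 2 for i in range(n))
            a[j] = soft_threshold(rho, lam / 2.0) / z if z > 0 else 0.0
    return a, b

def ensemble_transfer(X_src, y_src, beta, lam, L=50, seed=0):
    """Sample bags according to beta, fit LASSO on each, average the models."""
    rng = random.Random(seed)
    top = max(beta)
    models = []
    for _ in range(L):
        # Keep sample i when a uniform draw in [0, max(beta)) falls below beta_i.
        bag = [i for i, bi in enumerate(beta) if rng.random() * top < bi]
        if not bag:
            continue  # assumption: empty bags are simply skipped
        Xb, yb = [X_src[i] for i in bag], [y_src[i] for i in bag]
        models.append(weighted_lasso(Xb, yb, [1.0] * len(bag), lam))
    dim = len(X_src[0])
    a = [sum(m[0][j] for m in models) / len(models) for j in range(dim)]
    b = sum(m[1] for m in models) / len(models)
    return a, b
```

A source sample with zero weight is never placed in a bag, whereas the sample with the largest weight appears in every bag; intermediate weights translate into inclusion probabilities, which is how the transfer weights shape each training set.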

7. APPLICATION TO COLOR TRANSFER 

The color assessment model learned in Section 6 is applied
to color transfer for Web pages, a relatively new application.
Given a source Web page and a reference page, the
first step is to locate the fixed parts of both the source and
reference pages. In our work, the fixed part of the reference
page, whose colors we aim to transfer, is obtained using
the methods proposed in Section 4. For the source page,
users manually identify the fixed part.

Our color assessment and transfer approach for Web pages is
summarized in Fig. 7. The approach
automatically transfers the colors of each reference page in
the collection to the input Web page. The top-$N$ transferred
Web pages with the highest color scores are then selected. Our
proposed Web color transfer approach differs from conventional
image color transfer in two aspects: (1) our approach
includes a color assessment step, which can help users obtain
high-quality transfer results, and (2) only the colors in
the fixed parts of reference Web pages are considered, so the
proposed fixed part location method should be used. In
contrast, image color transfer has no corresponding
step.

The three key components in the proposed application are 
detailed below. 

• Constructing a reference Web page collection. This 
step collects representative Web pages for color trans- 
fer use. 

• Transferring colors based on the Web page collection.
This step selects each page in the Web
page collection in turn as the target Web page. The fixed part
of the target page is extracted and color transfer is
then performed.

• Ranking the transferred pages based on color assessment.
This step orders the transferred Web pages. A
preferred Web page list can be obtained according to
the color scores of the transferred pages. Finally, the top-$N$
pages are output, where $N$ can be set by the users.

The above components are further explained in the following four points:

1. In color transfer, it is unnecessary to take all the pages
in the reference collection as reference pages. A more
feasible solution is to take only similar Web pages as
references. The similarity among pages can be defined
according to the application context. For example, if two
Web pages are both the homepages of enterprises,
they are considered similar. The similarity can also be
measured by their structures. Web content and structural
mining techniques may help in the measurement.

Figure 8: The regression errors of the three competing assessment models for Web color assessment.

Figure 9: The regression errors of the three competing assessment models without or with PCA.

2. Color transfer can be performed using any of the existing
algorithms, such as those of [21] and [20]. The current
study does not intend to explore the optimal algorithm
among the existing color transfer algorithms. In
our work, we directly use the method proposed by
Pitie et al. [2007] as the basic color transfer algorithm.

3. Ideally, in the color assessment ranking step, a
visual quality evaluation algorithm should be
used to assess the transferred pages. However,
studies on visual quality for Web pages are still in their
infancy. Compared with the large amount of online color
rating data, the available rating data for the visual
quality of Web pages are very limited [27]. Hence, in
this study, we still rate the color transfer results with
our color assessment model, which is learned from
online color rating data.

4. The elements of the Web page collection can be any
visual objects, such as images and paintings. The kind of
visual objects used in the collection depends on the
application context or user choices.
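
The overall transfer-and-rank loop described in this section can be summarized by the sketch below. The callables `transfer` and `score` are placeholders for a color transfer algorithm and the learned assessment model; they, the function name `rank_transfers`, and the default `top_n` are assumptions made for illustration.

```python
def rank_transfers(source_page, reference_pages, transfer, score, top_n=3):
    """Transfer colors from every reference and keep the best-scoring results.

    `transfer(source, reference)` returns a recolored page and
    `score(page)` returns its color compatibility score.
    """
    results = []
    for ref in reference_pages:
        candidate = transfer(source_page, ref)
        results.append((score(candidate), candidate))
    # Highest color score first.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results[:top_n]
```

Restricting `reference_pages` to pages that are similar to the source, as suggested in the first point above, only changes the list that is passed in and leaves the loop unchanged.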

8. EXPERIMENTS 

This section aims (1) to evaluate the proposed framework 
for the construction of an assessment model for Web page 
colors, and (2) to investigate whether the proposed color 
transfer approach is useful or not. Therefore, the proposed 
color theme extraction and color assessment model construc- 
tion methods are evaluated in Sections 8.1 and 8.2, respec- 
tively; several examples of color transfer for Web pages with 
an online user study are presented in Section 8.3.



In color theme extraction and color assessment, we chose
homepages as our test data and collected 500 homepages,
mainly from companies, universities, governments, and personal
sites. A total of nine graduate students from the laboratory,
specifically six males and three females, were invited
to label the collected pages. Each participant was allowed to
view each page for up to 5 seconds and assessed the color design
of the page using one of five rating scores (1, 2, 3, 4, and 5).
Here, "1" means very bad, and "5" means very good. After
human labeling, each page had nine scores. The
average of these scores is taken as the color score.

8.1 Results on color theme extraction 

The four color theme extraction methods (K-means, outlier-aware
clustering + salience map-based location, outlier-aware
clustering + image synthesize-based location, and outlier-aware
clustering + block sampling-based location) shown in
Fig. 6 are compared on 100 randomly selected test
Web pages, whose fixed parts are manually determined. Let
$x_i$ be a pixel belonging to the fixed part of a Web page, and
$M = \{m_1, m_2, m_3, m_4, m_5\}$ be the color theme of that page
obtained by an extraction method. $M$ can be evaluated by
the average within-cluster sum of squares (aCS) below.

$$aCS = \frac{1}{n} \sum_{j=1}^{5} \sum_{x_i \in m_j\,(m_j \in M)} \|x_i - m_j\|_2^2$$

where $x_i \in m_j$ means that $m_j$ is the closest color to the $i$-th
pixel among the five colors in the color theme, and $n$ is the number
of pixels in the fixed part. A lower aCS
value indicates a better color theme ($M$).
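
A direct way to compute this measure is sketched below. The normalization by the number of pixels is an assumption about the lost denominator, colors are assumed to be tuples in a common color space such as CIELab, and the function name is hypothetical.

```python
def average_within_cluster_ss(pixels, theme):
    """Mean squared distance from each pixel to its closest theme color."""
    total = 0.0
    for x in pixels:
        # Squared distance to the nearest of the theme colors.
        total += min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in theme)
    return total / len(pixels)
```

Because each pixel is matched to its nearest theme color, a theme that misses a large region of the fixed part is penalized by every pixel in that region.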

On each page, each method is repeated 5 times, as the
underlying clustering algorithm is sensitive to initial points.
The average aCS values on the 100 pages for the four competing
methods are 6.357, 5.2663, 4.943, and 4.245 in the CIELab
space, respectively. Our outlier-aware method
with block sampling-based fixed part location achieves the
minimum mean aCS value, indicating the best performance
among the competing methods. The following experiments
therefore use only the block sampling-based method for
fixed part location.

Figure 10: The original Web page and three transferred Web pages. In our online user study, the average
rating scores by 110 users of the two transferred pages (T2 and T3) are higher than that of the original page.

8.2 Results on Web color assessment 

The assessment models below are compared. 

• The color assessment model proposed by [18], which is
denoted as CM in this paper, and is directly learned 
from the online color theme-rating data; 

• The color assessment model constructed by the pro- 
posed framework (shown in Fig. 2), which is based on 
the transfer learning shown in Eq. (5), and is denoted 
as TranCM. 

• The color assessment model constructed by the pro- 
posed framework (shown in Fig .2), which is based on 
the ensemble-based transfer learning shown in Algo- 
rithm 1, and is denoted as EnsemCM. 

A source data set and a target data set must be prepared
for the learning step in the construction of the TranCM and
EnsemCM models. All three online data sets (Kuler,
COLOURLovers, and Mturk) used in [18] are used to create
the source data. We randomly selected 3000 samples from
each online data set to create a new data set as the source
data. For the target data set, we collected 500 × n
(n = 1, 2, 3, 4, 5, and 6) Web pages based on the Alexa
rankings [1], removing duplicate pages with nearly the same
URLs (e.g., www.google.com and www.google.gr). After
compiling the source and target data sets and extracting
their color-theme features, the TranCM and EnsemCM models
can be obtained using the corresponding learning algorithms.

Each of the three competing color assessment models is
run on the features of the 500 collected test Web pages to
score these test pages. The corresponding values of residual
sum-of-square error (RSSE) are recorded. This process is
repeated five times, and the average RSSE values are
reported. The parameters are selected by cross validation.
For EnsemCM, L is set to 50.
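
The evaluation protocol above can be illustrated with a short Python sketch. It assumes that RSSE is the unnormalized sum of squared differences between predicted and human scores; the function names and the toy scores are hypothetical.

```python
def rsse(predicted, actual):
    """Residual sum-of-square error between predicted and human scores."""
    if len(predicted) != len(actual):
        raise ValueError("score lists must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def mean_rsse(runs, actual):
    """Average the RSSE over several repeated runs of one model."""
    return sum(rsse(run, actual) for run in runs) / len(runs)


# Hypothetical human scores for three pages and two repeated model runs.
human = [3.0, 4.0, 2.0]
runs = [[3.0, 3.0, 2.0], [2.0, 4.0, 3.0]]
print(mean_rsse(runs, human))
```

In the experiments, each model is scored on 500 test pages and the RSSE is averaged over five repetitions.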

Figure 8 shows the results of the three competing models
on the 500 test Web pages in terms of residual sum-of-square
error (RSSE). The three panels in Fig. 8 show the results
when the source data are created from Kuler, COLOURLovers,
and Mturk, respectively. A total of 500 × n (n = 1, 2, 3, 4,
5, and 6) Web pages are used as the target data. As the
amount of target data increases, the RSSE values (regression
errors) decrease. This is reasonable because more target
data allow the distribution of the target data to have a
stronger effect on the weights of the source data. The
proposed transfer learning-based model (TranCM) outperforms
the existing model (CM), while the proposed ensemble transfer
learning-based model (EnsemCM) achieves the best results on
all three data sets. These results suggest that transferring
the online color data to assess the colors of Web pages is
helpful, and that the introduced ensemble learning is useful
in the transfer.

Table 1: Top 10 most correlated features

Rank  Feature                                             Correlation
----  --------------------------------------------------  -----------
1     The plane-fitting features of the lightness in      positive
      CIELab
2     The mean of CIELab's b dimension                    negative
3     The minimum difference between adjacent colors in   positive
      terms of CHSV's V dimension
4     The second-largest difference between adjacent      negative
      colors in terms of HSV's S dimension
5     The mean of RGB's B dimension                       negative
6     The sum-of-square error of the fitted 2D plane in   negative
      CHSV
7     The largest difference between adjacent colors in   negative
      terms of HSV's H dimension
8     The second-largest difference between adjacent      positive
      colors in terms of CIELab's L dimension
9     The minimum difference between adjacent colors in   negative
      terms of CHSV's H dimension
10    The second-largest difference between adjacent      negative
      colors in terms of RGB's B dimension

Because some features are correlated, we further applied
principal component analysis (PCA) [3], which is a feature
reduction technique, to the involved data sets in order to
decorrelate the features. Figure 9 shows the performance of
the competing models when PCA is used. A total of 3000
samples are randomly selected from each online data set. All
3000 samples are used for the Web data. The RSSE values of
all the competing models are reduced, indicating that
decorrelation can improve the assessment performance.

The correlation between features and color ratings of Web
pages is also investigated on the 500 pages. The top 10
features that are most correlated with the average color
ratings are listed in Table 1. The lightness features are
among the most important features for color assessment,
which is consistent with the conclusions in [18]. The blue
color features are also important. The fourth feature is the
brightness of the colors of a Web page. The minimum
differences between adjacent colors are positively
correlated with the color ratings. This is reasonable
because a page with extremely small differences between
adjacent colors may have poor readability. The 7th to 10th
most correlated features indicate that the differences
between adjacent colors are important. The above
observations highlight the differences between color ratings
for Web pages and those for images.

Figure 11: The original Web page and three transferred Web pages. In our online user study, the average
rating scores by 110 users of the two transferred pages (T1 and T2) are higher than that of the original page.

Figure 12: The original Web page and three transferred Web pages. In our online user study, the average
rating scores by 110 users of the two transferred pages (T1 and T2) are higher than that of the original page.
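
A feature ranking of this kind can be sketched as follows. The text does not state which correlation coefficient is used, so the Pearson coefficient here is an assumption, and the feature names and toy values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / sqrt(var_x * var_y)


def rank_features(features, ratings, top=10):
    """Rank features (name -> per-page values) by absolute correlation."""
    scored = [(name, pearson(values, ratings)) for name, values in features.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top]


# Hypothetical features for four pages and their average color ratings.
features = {
    "mean_lightness": [1.0, 2.0, 3.0, 4.0],
    "mean_blue": [4.0, 3.0, 2.0, 1.0],
    "constant": [5.0, 5.0, 5.0, 5.0],
}
ratings = [2.0, 4.0, 6.0, 8.0]
ranked = rank_features(features, ratings)
for name, r in ranked:
    sign = "positive" if r > 0 else "negative" if r < 0 else "none"
    print(name, round(r, 3), sign)
```

Sorting by the absolute value keeps strongly negative features near the top, which matches the mix of positive and negative entries in Table 1.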

8.3 Case study for Web color transfer 

As described earlier, the reference in Web color transfer
can be any form of visual object, including Web pages and
images. We transfer colors for ten Web pages from the
collected Web pages and some cartoon images. To assess the
transferred Web pages, we launched a user study through an
online user-study website in China [7]. In our user study,
each user was invited to rate the colors of ten groups of
Web pages (including the pages in Figures 10-12). Each group
contains four pages: an original Web page and its three
transferred result pages. Users rate each page on a scale
from 1 to 5. A total of 110 users (53 males and 57 females)
rated the Web pages. About 76.4% of the users surf the
Internet more than 8 hours a week. About 94.54% are between
15 and 40 years old.

According to the user study, 56.67% of the transferred
pages have higher rating scores than the original pages.
Figures 10-12 show three color transfer examples. The
leftmost page in each example is the original page, while
the remaining three pages (i.e., T1, T2, and T3) are the
transferred results ranked in the top three by the learned
color assessment model. Colors in some transferred pages
have higher rating scores. For example, almost all the users
between 15 and 20 years old gave the highest scores to the
T1 page in Fig. 12.
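
The share of transferred pages that outscore their original page can be computed as in the sketch below. The function name and the mean ratings are made up for illustration; only the group structure (one original page and three transferred pages) follows the study design.

```python
def share_of_improved_pages(groups):
    """groups: list of (original_mean, [transferred_means]) pairs."""
    improved = 0
    total = 0
    for original_mean, transferred_means in groups:
        for score in transferred_means:
            total += 1
            if score > original_mean:
                improved += 1
    return improved / total


# Made-up mean ratings for two groups of pages.
groups = [
    (3.0, [3.5, 2.8, 3.2]),
    (3.4, [3.1, 3.6, 3.3]),
]
print(f"{share_of_improved_pages(groups):.2%}")
```

If all thirty transferred pages (ten groups of three) are counted, the reported 56.67% would correspond to 17 of 30 pages.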

With our automatic Web color transfer and assessment
framework, designers can obtain many Web-page images with
different colors at very low cost. Although these images
are not real Web pages, there is no obvious difference in
how their colors are perceived. These Web-page images show
designers different color tones and impressions, which can
provide new insights or inspiration.

8.4 Limitations 

The primary limitation is that the features are merely the 
color theme instead of the original colors of the fixed part. 
The secondary limitation is that this study assesses the col- 
ors of a Web page by simply assessing the compatibility of 
the color theme of that page. This simple strategy ignores 
the interactive nature of a Web page. For example, read- 
ability is a key property of Web pages J^. However, it 
is ignored in this work, so some Web pages with poor
readability may receive high color scores. Another major
limitation is the manual determination of the temporal
parts of source Web pages. This manual operation may hinder
the practical application of our proposed Web color transfer
and assessment approach.

9. CONCLUSIONS 

Colors are very important to Web pages. This paper has
investigated the assessment of Web page colors. A framework
for constructing assessment models has been proposed that
learns from existing online color theme-rating data for Web
color assessment. Theories and techniques from Web mining,
computer graphics, computer vision, and machine learning are
introduced to address the main challenges in the key steps
of the model construction (i.e., fixed part location, color
theme extraction, and model learning). The constructed color
assessment model is then applied to a new application,
namely, color transfer for Web pages. Experiments and online
user study results suggest the effectiveness of our proposed
methodologies.

As this is a cross-disciplinary topic, our findings can
promote the intersection of the involved disciplines. Color
transfer between Web pages brings new challenges and may
encourage the combination of more computer graphics
techniques and Web mining techniques in Web design.

10. REFERENCES 

[1] Alexa. http://www.alexa.com/topsites, 11/30/2011.
[2] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. VIPS: a
vision-based page segmentation algorithm. Microsoft
Technical Report, (MSR-TR-2003-79), 2003.
[3] L. J. Cao, K. S. Chua, and W. K. Chong. A
comparison of PCA, KPCA and ICA for dimensionality
reduction in support vector machine. Neurocomputing,
55(1-2):321-336, 2003.
[4] Checkmycolours. http://www.checkmycolours.com/.
[5] D. Cohen-Or, O. Sorkine, R. Gal, T. Leyvand, and
Y.-Q. Xu. Color harmonization. ACM Trans. Graph.,
25(3):624-630, 2006.
[6] C. K. Coursaris, E. Lansing, S. J. Swierenga, and
E. Watrall. An empirical investigation of color
temperature and gender effects on web aesthetics.
Journal of Usability Studies, 3(3):103-117, 2008.
[7] Diaocha. http://www.diaochapai.com.
[8] T. G. Dietterich. Ensemble methods in machine
learning. In International Workshop on Multiple
Classifier Systems, pages 1-15, 2000.
[9] W. Dong, G. Bao, X. Zhang, and J.-C. Paul. Fast
local color transfer via dominant colors mapping. In
ACM SIGGRAPH ASIA 2010 Sketches, pages
46:1-46:2. ACM, 2010.
[10] P.-A. Forero, V. Kekatos, and G.-B. Giannakis.
Robust clustering using outlier-sparsity regularization.
IEEE Trans. Sig. Proc., 60(8):4163-4177, 2012.
[11] J. Friedman, T. Hastie, and R. Tibshirani.
Regularization paths for generalized linear models via
coordinate descent, volume 33, pages 1-22, 2009.
[12] J. Goethe. Theory of Colors. 1810.
[13] J. Huang et al. Correcting sample selection bias
by unlabeled data. In NIPS, pages 601-608, 2006.
[14] I. Kondratova and I. Goldfarb. Color your website: use
of colors on the web. In UI-HCII, pages 123-132, 2007.
[15] J.-F. Lalonde, D. Hoiem, A. A. Efros, C. Rother,
J. Winn, and A. Criminisi. Photo clip art. ACM
Trans. Graph., 26(3), July 2007.
[16] J. Ling and P. Van Schaik. The effect of text and
background colour on visual search of web pages.
Displays, 23(5):223-230, 2002.
[17] T. Liu, J. Sun, N. Zheng, X. Tang, and H.-Y. Shum.
Learning to detect a salient object. In IEEE
CVPR, pages 1-8, 2007.
[18] P. O'Donovan, A. Agarwala, and A. Hertzmann. Color
compatibility from large datasets. ACM Trans.
Graph., 30(4):63:1-63:12, Aug. 2011.
[19] S.-E. Palmer and K.-B. Schloss. An ecological valence
theory of human color preference. PNAS,
107(19):8877-8882, 2010.
[20] F. Pitie, A. C. Kokaram, and R. Dahyot. Automated
colour grading using colour distribution transfer.
Comput. Vis. Image Underst., 107(1-2):123-137, 2007.
[21] E. Reinhard, M. Ashikhmin, B. Gooch, and P. Shirley.
Color transfer between images. IEEE Comput. Graph.
Appl., 21(5):34-41, 2001.
[22] Y. Rubner, C. Tomasi, and L. J. Guibas. The earth
mover's distance as a metric for image retrieval. Int.
J. Comput. Vision, 40(2):99-121, 2000.
[23] K. B. Schloss and S. E. Palmer. Aesthetic response to
color combinations: preference, harmony, and
similarity. Attention, Perception, & Psychophysics,
73(2):551-571, 2011.
[24] K.-E. Schmidt, Y. Liu, and S. Sridharan. Webpage
aesthetics, performance and usability: Design variables
and their effects. Ergonomics, 52:631-643, 2009.
[25] Y.-W. Tai, J. Jia, and C.-K. Tang. Local color transfer
via probabilistic segmentation by expectation-maximization.
In IEEE CVPR, pages 747-754, 2005.
[26] L. Thorlacius. The role of aesthetics in web design.
Nordicom Review, 28(1):63-76, 2007.
[27] O. Wu, Y. Chen, B. Li, and W. Hu. Evaluating the
visual quality of web pages using a computational
aesthetic approach. In WSDM, pages 337-346, 2011.

APPENDIX 

A. SALIENCE MAP-BASED FIXED PART 
LOCATION 

This method reweights the proportion of colors according
to the saliency map output by saliency detection [17]. The
temporal parts usually have higher saliency weights, because
temporal areas usually show image content or flash
advertisements designed to attract more of users' attention
than the fixed areas. In contrast, the saliency of fixed
parts is usually low. Based on these observations, a
saliency map-based algorithm is proposed to locate the fixed
part of a Web page. A pixel is considered to be in the fixed
part if its saliency value is below a certain threshold.
This method is very sensitive to the results of saliency map
detection. For many Web pages, areas of the fixed part are
also detected as salient areas, and thus this method fails.
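
The thresholding rule described above can be sketched as follows. The saliency values, the threshold of 0.5, and the function names are hypothetical; a real saliency map would come from a detector such as the one in [17].

```python
def fixed_part_mask(saliency, threshold):
    """Mark a pixel as fixed (True) when its saliency is below the threshold."""
    return [[value < threshold for value in row] for row in saliency]


def fixed_part_pixels(image, mask):
    """Collect the colors of the pixels that the mask marks as fixed."""
    return [
        color
        for image_row, mask_row in zip(image, mask)
        for color, is_fixed in zip(image_row, mask_row)
        if is_fixed
    ]


# Hypothetical 2x2 page: the right column is a salient (temporal) area.
saliency = [[0.1, 0.9], [0.2, 0.8]]
image = [[(255, 255, 255), (200, 0, 0)], [(250, 250, 250), (0, 0, 200)]]
mask = fixed_part_mask(saliency, threshold=0.5)
print(fixed_part_pixels(image, mask))
```

The sketch also shows the weakness noted above: any fixed-part pixel that the detector scores above the threshold is silently dropped.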

B. IMAGE SYNTHESIZE-BASED FIXED PART 
LOCATION 

The colors of the temporal part change constantly, whereas
those of the fixed parts remain relatively unchanged. Thus,
for a specific Web page, we can first collect a number of
Web-page images at different times from the same URL. The
collected Web-page images are then synthesized into a single
image. As a consequence, we can extract color themes from
the synthesized image, because the negative effects of
colors from temporal parts have been reduced. The
computational complexity of this method is high because the
synthesized image is large.
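
The text does not specify how the snapshots are synthesized. The sketch below assumes, purely for illustration, that the snapshots are tiled into one large image, which is equivalent to pooling their pixels; the function name and toy snapshots are hypothetical.

```python
from collections import Counter


def pooled_color_shares(snapshots):
    """Pool the pixels of all snapshots, as if tiled into one large image,
    and return the share of each color in the pooled image."""
    counts = Counter(
        color for snapshot in snapshots for row in snapshot for color in row
    )
    total = sum(counts.values())
    return {color: count / total for color, count in counts.items()}


# Three snapshots of a 1x2 page: the left pixel is fixed, the right one changes.
snapshots = [
    [[(255, 255, 255), (200, 0, 0)]],
    [[(255, 255, 255), (0, 200, 0)]],
    [[(255, 255, 255), (0, 0, 200)]],
]
shares = pooled_color_shares(snapshots)
print(shares[(255, 255, 255)])
```

Under this assumption the fixed color keeps its share while each temporal color is diluted, and the pooled image grows linearly with the number of snapshots, which is in line with the cost noted above.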



